How Infinitely Wide Neural Networks Benefit from Multi-task Learning -- an Exact Macroscopic Characterization. (arXiv:2112.15577v3 [cs.LG] UPDATED)
In practice, multi-task learning (through learning features shared among
tasks) is an essential property of deep neural networks (NNs). While
infinite-width limits of NNs can provide a good intuition for their
generalization behavior, the well-known infinite-width limits of NNs in the
literature (e.g., neural tangent kernels) assume specific settings in which
wide ReLU-NNs behave like shallow Gaussian Processes with a fixed kernel.
Consequently, in such settings, these NNs lose their ability to benefit from
multi-task learning in the infinite-width limit. In contrast, we prove that
optimizing wide ReLU neural networks with at least one hidden layer using
L2-regularization on the parameters enforces multi-task learning due to
representation learning, even in the limiting regime where the network width
tends to infinity. We present an exact quantitative characterization of this
infinite-width limit in an appropriate function space that neatly describes
multi-task learning.
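To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's construction) of the kind of architecture the abstract refers to: a wide one-hidden-layer ReLU network whose hidden layer is shared across tasks, with per-task linear heads, trained with L2 regularization (weight decay) on all parameters. All names, data, and hyperparameters below are illustrative assumptions.

# Hypothetical illustration: shared wide ReLU representation, per-task heads,
# L2 regularization on all parameters via weight decay.
import torch

torch.manual_seed(0)
width, n_tasks, d_in, n = 1024, 2, 10, 64

# Toy data: two related regression tasks defined on the same inputs.
X = torch.randn(n, d_in)
Y = torch.stack([X[:, 0].sin(), X[:, 0].cos()], dim=1)   # shape (n, n_tasks)

shared = torch.nn.Sequential(torch.nn.Linear(d_in, width), torch.nn.ReLU())
heads = torch.nn.ModuleList([torch.nn.Linear(width, 1) for _ in range(n_tasks)])

params = list(shared.parameters()) + list(heads.parameters())
# weight_decay adds the L2 penalty on the parameters to the training objective.
opt = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-3)

for step in range(500):
    opt.zero_grad()
    h = shared(X)                                          # features shared by all tasks
    preds = torch.cat([head(h) for head in heads], dim=1)  # one column per task
    loss = ((preds - Y) ** 2).mean()                       # averaged task losses
    loss.backward()
    opt.step()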